Identifying "aboutness topics": two annotation experiments
Abstract
This paper deals with the annotation of “aboutness topic” (also known as “sentence topic”) in naturally occurring data. We report on two annotation experiments involving German newspaper texts: In experiment 1, based on the annotation guidelines by Götze et al. (2007), two expert annotators had to select the aboutness topic from among a small number of pre-defined choices, for a total of 588 se...
Similar resources

Understanding Document Aboutness Step Two: Identifying Interesting Things
We define the notion of an interesting nugget in a document. Such nuggets attract a user's attention and lead them to explore more information around that nugget. In order to measure and model interestingness, we look at browsing sessions within Wikipedia and we build a data set of transitions (clickthrough) from a source Wikipedia page to a destination Wikipedia page through anchor clicks. We ...
More Reflections on "Aboutness" TREC-2001 Evaluation Experiments at Justsystem
The TREC-2001 Web track evaluation experiments at the Justsystem site are described with a focus on the “aboutness” based approach in text retrieval. In the web ad hoc task, our TREC-9 approach is adopted again, combining both pseudo-relevance feedback and reference database feedback but the setting is calibrated for an early precision preferred search. For the entry page finding task, we combi...
Identifying Topics by Position
This paper addresses the problem of identifying likely topics of texts by their position in the text. It describes the automated training and evaluation of an Optimal Position Policy, a method of locating the likely positions of topic-bearing sentences based on genre-specific regularities of discourse structure. This method can be used in applications such as information retrieval, routing, and...
Understanding Document Aboutness - Step One: Identifying Salient Entities
We propose a system that determines the salience of entities within web documents. Many recent advances in commercial search engines leverage the identification of entities in web pages. However, for many pages, only a small subset of entities are important, or central, to the document, which can lead to degraded relevance for entity triggered experiences. We address this problem by devising a ...
Journal
Journal title: Dialogue & Discourse
Year: 2013
ISSN: 2152-9620
DOI: 10.5087/dad.2013.206